A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

Author

Michael Kearns

Abstract

We give an analysis of the generalization error of cross validation in terms of two natural measures of the difficulty of the problem under consideration: the approximation rate (the accuracy to which the target function can be ideally approximated as a function of the number of hypothesis parameters), and the estimation rate (the deviation between the training and generalization errors as a function of the number of hypothesis parameters). The approximation rate captures the complexity of the target function with respect to the hypothesis model, and the estimation rate captures the extent to which the hypothesis model suffers from overfitting. Using these two measures, we give a rigorous and general bound on the error of cross validation. The bound clearly shows the tradeoffs involved with making γ (the fraction of data saved for testing) too large or too small. By optimizing the bound with respect to γ, we then argue (through a combination of formal analysis, plotting, and controlled experimentation) that the following qualitative properties of cross validation behavior should be quite robust to significant changes in the underlying model selection problem: (1) When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of γ. (2) The importance of choosing γ optimally increases, and the optimal value for γ decreases, as the target function becomes more complex relative to the sample size. (3) There is nevertheless a single fixed value for γ that works nearly optimally for a wide range of target function complexity.
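The training-test split described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy target (labels alternating over `s` equal intervals of [0, 1) with label noise), the hypothesis model (piecewise-constant classifiers with `d` equal-width bins, so `d` plays the role of the number of parameters), and all function names. For each `d` the sketch minimizes training error on a (1 - γ) fraction of the sample, then selects the hypothesis with the lowest error on the held-out γ fraction.

```python
import random


def make_sample(m, s, noise, rng):
    """Draw m labelled points; the toy target alternates 0/1 over s equal intervals of [0, 1)."""
    sample = []
    for _ in range(m):
        x = rng.random()
        y = int(x * s) % 2
        if rng.random() < noise:  # flip the label with probability `noise`
            y = 1 - y
        sample.append((x, y))
    return sample


def fit_histogram(train, d):
    """Minimize training error over d-bin piecewise-constant classifiers (majority vote per bin)."""
    votes = [0] * d
    for x, y in train:
        votes[min(int(x * d), d - 1)] += 1 if y == 1 else -1
    return [1 if v > 0 else 0 for v in votes]


def error(h, data):
    """Fraction of points in `data` misclassified by the histogram classifier h."""
    if not data:
        return 0.0
    d = len(h)
    return sum(h[min(int(x * d), d - 1)] != y for x, y in data) / len(data)


def holdout_cv(sample, gamma, d_max):
    """Train on a (1 - gamma) fraction; select d by error on the held-out gamma fraction."""
    n_test = int(round(gamma * len(sample)))
    n_train = len(sample) - n_test
    train, test = sample[:n_train], sample[n_train:]
    candidates = [fit_histogram(train, d) for d in range(1, d_max + 1)]
    return min(candidates, key=lambda h: error(h, test))


def average_error(gamma, m=500, s=10, noise=0.1, d_max=40, trials=20, seed=0):
    """Estimate the generalization error of the selected hypothesis on a large fresh sample."""
    rng = random.Random(seed)
    fresh = make_sample(5000, s, noise, rng)
    total = 0.0
    for _ in range(trials):
        h = holdout_cv(make_sample(m, s, noise, rng), gamma, d_max)
        total += error(h, fresh)
    return total / trials


if __name__ == "__main__":
    for gamma in (0.05, 0.1, 0.3, 0.5, 0.9):
        print(f"gamma={gamma:.2f}  average error={average_error(gamma):.3f}")
```

Varying `s` relative to `m` in `average_error` is one way to explore how sensitive the selected hypothesis is to γ in this toy setting. The sketch only simulates the procedure and does not compute the paper's bound.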

Similar resources

Estimation of LOS Rates for Target Tracking Problems using EKF and UKF Algorithms- a Comparative Study

One of the most important problems in target tracking is line-of-sight (LOS) rate estimation for use with the proportional navigation (PN) guidance law. This paper deals with estimation of the position and LOS rates of the target with respect to the pursuer from available noisy RF seeker and tracker measurements. Due to many important for exact estimation on tracking problems must target position and Line O...

Application of the MoDrY model for the estimation of potato yielding

The study was conducted with the application of the model MoDrY (Model-Dry periods-Yield) for the estimation of the level of potato yields on the basis of dry periods occurring during the particular periods between the phenological phases of the crop plant. A characteristic feature of this model, unlike most existing weather-yield models, is that the principle of its operation is based only ...

Nonlinear disjunctive kriging for the estimating and modeling of a vein copper deposit

Estimation of mineral resources and reserves with low values of error is essential in mineral exploration. The aim of this study is to estimate and model a vein-type deposit using the disjunctive kriging method. Disjunctive kriging (DK), as an appropriate nonlinear estimation method, has been used for estimation of Cu values. For estimation of Cu values and modelling of the distributio...

Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation

In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...

On Optimal Data Split for Generalization Estimation and Model Selection

Modeling with flexible models, such as neural networks, requires careful control of the model complexity and generalization ability of the resulting model. Whereas general asymptotic estimators of generalization ability have been developed over recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there isn't sufficient data available to reliably use these estimato...

Journal: Neural Computation

Volume: 9

Publication date: 1995
